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Within the next few years, Advanced LIGO and Virgo should detect gravitational waves from 
binary neutron star and neutron star-black hole mergers. These sources are also predicted to power 
a broad array of electromagnetic transients. Because the electromagnetic signatures can be faint 
and fade rapidly, observing them hinges on rapidly inferring the sky location from the gravitational 
wave observations. Markov chain Monte Carlo methods for gravitational-wave parameter estimation 
can take hours or more. We introduce BAYESTAR, a rapid, Bayesian, non-Markov chain Monte 
Carlo sky localization algorithm that takes just seconds to produce probability sky maps that are 
comparable in accuracy to the full analysis. Prompt localizations from BAYESTAR will make it 
possible to search electromagnetic counterparts of compact binary mergers. 

PACS numbers: 04.80.Nn, 04.30.Tv, 02.50.Tt 


The Laser Interferometer Gravitational Wave Obser¬ 
vatory (LIGO) [I, 2] has just begun taking data [3] in 
its “Advanced” configuration. The two LIGO detectors 
will ultimately increase their reach in volume within the 
local Universe by 3 orders of magnitude as compared to 
their initial configurations through 2010. They form the 
first parts of a sensitive global gravitational-wave (GW) 
detector network, soon to be augmented by Advanced 
Virgo [41 and later by the Japanese KAGRA facility [5, G] 
and LIGO-India [7]. 

The most readily detectable sources of GWs include bi¬ 
nary neutron star mergers, with 0.4-400 events per year 
within the reach of Advanced LIGO at its final design 
sensitivity [8] . These binary systems are not only efficient 
sources of GWs but also potential sources of detectable 
electromagnetic (EM) transients from the aftermath of 
the tidal disruption of the neutron stars (NSs). Met¬ 
zger and Berger [9] argue that the most promising EM 
counterparts are the hypothesized optical/infrared “kilo- 
novae” powered by the radioactive decay of r-process ele¬ 
ments synthesized within the neutron-rich ejecta. These 
are expected to be faint and red and to peak rapidly, 
reaching an absolute magnitude of only Mji ^ —13 
within a week, though rising several magnitudes brighter 
in the infrared [10]. 

Several mechanisms could make the kilonovae brighter, 
bluer, and hence more readily detectable [11, 12], but 
peak even earlier, within hours. If, as is widely believed 
[13-16], binary neutron star (BNS) mergers are indeed 
progenitors of short gamma-ray bursts (GRBs), then a 
small (due to jet collimation) fraction of Advanced LIGO 
events could also be accompanied by a bright optical af¬ 
terglow, but this signature, likewise, would peak within 
hours or faster. 

Adding to the challenge of detecting a faint, short-lived 
optical transient, there is an extreme mismatch between 
the sky localization accuracy of GW detector networks, 
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~10-500deg^ [17-27], and the fields of view (FOVs) of 
l-8m-class optical telescopes. Wide-field optical tran¬ 
sient facilities such as BlackGEM (0.6m/2.7deg^), the 
Zwicky Transient Facility (1.2 m/47 deg^) [28], the Dark 
Energy Camera (4m/3deg^), or the Large Synoptic Sur¬ 
vey Telescope (8.4 m/9.6 deg^), operated in “target of op¬ 
portunity” mode, may be able to tile these large areas 
rapidly enough to find the one needle in the haystack 
that is connected to the GW event. However, prompt 
and accurate GW position reconstructions will be of the 
utmost importance for guiding the selection of fields to 
observe. 

The final science run of the initial LIGO and Virgo in¬ 
struments saw the first joint search for GW and EM emis¬ 
sion from compact binaries. This involved several ad¬ 
vances in the GW data analysis [29] , including the first re¬ 
al-time matched-filter detection pipeline, the Multi-Band 
Template Analysis [30]; a semi-coherent, ad hoc rapid 
triangulation code, Timing-1—k; and the first version of a 
rigorous Bayesian Markov chain Monte Carlo (MCMC) 
parameter estimation code, LALINFERENCE [31]— all 
in service of the first search for X-ray [32] and optical 
[33] counterparts of GW triggers, by a consortium of fa¬ 
cilities. Despite the technical achievements in the GW 
data analysis, there was an undesirable tradeoff between 
the speed as well as accuracy of the rapid localization 
and the full parameter estimation: the former could ana¬ 
lyze a detection candidate in minutes, whereas the latter 
took half a day; the latter decreased the area on the sky 
by a factor of 1/20 over the former but took 1000 times 
as long to run [26]. 

The success of EM follow-up of LIGO events will de¬ 
pend critically on disseminating high quality sky local¬ 
izations within a time scale of minutes to hours. To that 
end, we have devised a rapid and accurate Bayesian sky 
localization method that takes mere seconds to achieve 
approximately the same accuracy as the full MGMG anal¬ 
ysis. Our key insights are the following: 

1. Nearly all of the information in the GW time series 
that is informative for sky localization is encap¬ 
sulated within the matched-filter estimates of the 
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times, amplitudes, and phases on arrival at the de¬ 
tectors. To infer the position and distance of a GW 
event, we only have to consider three numbers per 
detector rather than a densely sampled strain time 
series per detector. 

2. The matched-filter pipeline can be treated as a 
measurement system in and of itself. Just like the 
strain time series from the detectors, the resultant 
times, amplitudes, and phases have a predictable 
and quantifiable measurement uncertainty that can 
be translated into a likelihood function suitable for 
Bayesian inference. 

3. The Fisher information matrix will provide clues as 
to suitable forms of this likelihood function. Recent 
GW parameter estimation literature has largely re¬ 
jected the Fisher matrix,^ but this is mostly on the 
grounds of the abuse of the related Gramer-Rao 
lower bound (GRLB) outside its realm of validity 
of moderate to high signal-to-noise ratio (S/N) [35- 
38] . However, we recognize that the block structure 
of the Fisher matrix provides important insights 
and is a useful quantity for checking the validity of 
the aforementioned likelihood function, quite inde¬ 
pendent of the GRLB. 

4. The Fisher matrix teaches us that errors in sky 
localization are semi-independent from errors in 
masses. This implies that if we care only about 
position reconstruction and not about jointly esti¬ 
mating masses as well, then we can reduce the di¬ 
mensionality of the parameter estimation problem 
significantly. Moreover, this frees us of the need 
to directly compute the expensive post-Newtonian 
model waveforms, making the likelihood itself much 
faster to evaluate. 

5. Thanks to a simple likelihood function and a 
well-characterized parameter space, we may dis¬ 
pense with costly and parallelization-resistant 
MCMG integration and instead perform the 
Bayesian marginalization with classic, determinis¬ 
tic, very low order Gaussian quadrature. 

6. The Bayesian inference scheme thus designed to op¬ 
erate on the matched-filter detection pipeline out¬ 
puts could be trivially generalized to operate on 
the full GW time series within the same compu¬ 
tational constraints. This would yield a fast and 
coherent localization algorithm that is mathemati¬ 
cally equivalent to the full MCMG parameter esti¬ 
mation, restricted to extrinsic parameters (sky lo¬ 
cation, binary orientation, and distance). 


^ Though not entirely; see [.'14]. 


We call this algorithm BAYESian TriAngulation and 
Rapid localization (BAYESTAR).^^ It is as fast as Tim¬ 
ing-1—I- but nearly as accurate as the rigorous full pa¬ 
rameter estimation. It is unique in that it bridges the 
detection and parameter estimation of GW signals, two 
tasks that have until now involved very different numer¬ 
ical methods and time scales. Beginning with the first 
Advanced LIGO observing run, BAYESTAR is provid¬ 
ing localizations within minutes of the detection of any 
BNS merger candidate, playing a key role in enabling 
rapid follow-up observations. 

This paper is organized as follows. In Sec. I, we de¬ 
scribe the GW signal model and sketch the standard de¬ 
tection algorithm, the matched-hlter bank. In Sec. II, 
we describe Bayesian inference formalism and the pre¬ 
vailing method for inferring the parameters of detected 
candidates, MCMG sampling. In Sec. Ill, we propose the 
BAYESTAR likelihood as a model for the uncertainty in 
the matched-filter parameter estimates, and discuss its 
relationship to and consistency with the likelihood for 
the full GW data. In Sec. IV, we describe the input to 
BAYESTAR supplied by the detection pipeline, and the 
prior distribution on parameters. In Sec. V, we explain 
the integration scheme by which the posterior probabil¬ 
ity distribution is calculated for a given sky location. In 
Sec. VI, we show a scheme whereby the sky posterior 
is sampled on an adaptive Hierarchical Equal Area iso- 
Latitude Pixelization (HEALPix) grid. In Sec. VHIE, we 
report the running time of the algorithm on the hardware 
available on the LIGO Data Grid. Finally, in Sec. VHI, 
we quantify the sky localization performance on a com¬ 
prehensive set of simulated GW events. 


I. SIGNAL MODEL AND DETECTION 


In the time domain, the strain observed by a single 
GW interferometer is 


yi{t) = +ni(t). 

In the frequency domain. 


( 1 ) 


/ OO 

y{t)e-^^^dt = 6) + A,(a;), (2) 

-OO 

where Yi(a;; 0 ) is the GW signal given a parameter vec¬ 
tor 6 that describes the GW source and Ni{uj) is that 
detector’s Gaussian noise with one-sided power spectral 

density (PSD) Si{u}) = E \Ni{uj)f + E \Ni{—u})\^ = 


^ A pun on the Cylon battleships in the American television series 
Battlestar Galactica. The defining characteristic of the Cylons is 
that they repeatedly defeat humanity by using their superhuman 
information-gathering ability to coordinate overwhelming forces. 

^ We do not like to mention the final ‘L’ in the acronym, because 
then it would be pronounced BAYESTARL, which sounds stupid. 








3 


2E 


i^.Hr 


We shall denote the combined observa¬ 


tion from a network of detectors as Y(w) = {Yi{uj)}i. 

Under the assumptions that the detector noise is Gaus¬ 
sian and that the noise from different detectors is uncor¬ 
related, the likelihood of the observation, y, conditioned 
upon the parameters 0, is a product of Gaussian distri¬ 
butions: 


C{Y-e) = l[p{Y,\6) 

i 


oc exp 


2^ Jo S^ioj) 


( 3 ) 


A compact binary coalescence (CBG) source is speci¬ 
fied by a vector of extrinsic parameters describing its po¬ 
sition and orientation and intrinsic parameters describing 
the physical properties of the binary components^: 


e = 


a 

right ascension 

6 

declination 

r 

distance 

t® 

arrival time at geocenter 

t 

inclination angle 

V’ 

polarization angle 

4^c 

coalescence phase 

mi 

first component’s mass 'j 

1712 

second component’s mass 1 

Si 

first component’s spin [ 

S2 

second component’s spin J 


extrinsic 


6/„ 


intrinsic 

parameters. 


( 4 ) 

Assuming a non-precessing circular orbit, we can write 
the GW signal received by any detector as a linear com¬ 
bination of two basis waveforms, Hq and Ht ^/2 [45]. Hq 
and H ^/2 c^re approximately “in quadrature” in the same 
sense as the cosine and sine functions, being orthogo¬ 
nal and out of phase by 7r/2 at all frequencies. If Hq 
and Ht ^/2 are Fourier transforms of real functions, then 
Hq{u}) = Ho{—u}) and H^/ 2 {^) = 
write 


H^/2H=Ho{u;) 


—i if w > 0 
i if w < 0 


( 5 ) 


This list of parameters involves some simplifying assumptions. 
Eccentricity is omitted; although it may play a major role in the 
evolution and waveforms of rare close binaries formed by dynam¬ 
ical capture 41], BNS systems formed by binary stellar evolu¬ 
tion should almost always circularize due to tidal interaction [42] 
and later GW emission [13] long before the inspiral enters LIGO’s 
frequency range of ~ 10-1000 kHz. Tidal deformabilities of the 
NSs are omitted because the signal imprinted by the companions’ 
material properties is so small that it will only be detectable by 
an Einstein Telescope-class GW observatory [44]. Furthermore, 
in GW detection efforts, especially those focused on BNS sys¬ 
tems, the component spins Si and S 2 are often assumed to be 
nonprecessing and aligned with the system’s total angular mo¬ 
mentum and condensed to a single scalar parameter y, or even 
neglected entirely: Si = S 2 = 0. 


For brevity, we define H = Hq and write all subseqnent 
equations in terms of the H basis vector alone. Then, 
we can write the signal model in a way that isolates all 
dependence on the extrinsic parameters, into a few 
coefficients and all dependence on the intrinsic parame¬ 
ters, ^in, into the basis waveform, by taking the Fourier 
transform of Eq. (2.8) of Ref. [45], 


X,(uj-e) = 

r 

i (1 -I- cos^ 6) 3? {C} — i (cos i) 3 {O 


H{uj; 0i„) 


( 6 ) 


for w > 0, where 


C —e {F^^i{a,S,t^) + iFx^i{a,S,t^)). (7) 


The quantities and Fxy are the dimensionless detec¬ 
tor antenna factors, defined such that 0 < < 

1. They depend on the orientation of detector i as well 
as the sky location and sidereal time of the event and 
are presented in Ref. [46]. In a coordinate system with 
the X and y axes aligned with the arms of a detector, the 
antenna pattern is given in spherical polar coordinates as 


F_^ = — ^(1 + cos^ 9) cos2(?!). 

(8) 

Fx = — cos 9 sin 2(f>. 

( 9 ) 


The unit vector represents the position of detector i in 
units of light travel time.^ The vector n is the direction of 
the source. The negative sign in the dot product —di-n is 
present because the direction of travel of the GW signal 
is opposite to that of its sky location. The quantity ri^i 
is a fiducial distance at which detector i would register 
S/N=l for an optimally oriented binary (face-on, and in 
a direction perpendicular to the interferometer’s arms): 


ri^i = l/cTj, 



Siiw) 


( 10 ) 


More succinctly, we can write the signal received by 
detector i in terms of observable extrinsic parameters 
= {p^ , 7 i,Ti), the amplitude pi, phase 7 i, and time 
delay on arrival at detector i: 


Xi(a;;0i,0i„) 

= W(c.;p„7„r„ei„) = (11) 

The prevailing technique for detection of GWs from 
CBCs is to realize a maximum likelihood (ML) estima¬ 
tor (MLE) from the likelihood in Eq. (3) and the signal 


^ When considering transient GW sources such as those that we 
are concerned with in this article, the origin of the coordinate 
system is usually taken to be the geocenter. For long-duration 
signals such as those from statically deformed neutron stars, the 
solar system barycenter is a more natural choice. 
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model in Eq. (11). Concretely, this results in a bank of 
matched filters, or cross-correlations between the incom¬ 
ing data stream and a collection of template waveforms, 


2i(ri; 0in) 


1 

^z(^in) 



Si{uj) 


du!. (12) 


The ML point estimates of the signal parameters, 
MLE(y) = = {{/5i,7j,fi}-,0i„}, are given 

by 

= argmaxY ^\zi (ri;0in)|^ , (13) 

Sindh}. T 



7i = argzj . (15) 


A detection candidate consists of {{pi, Yi Ali > ®in}- 
There are various ways to characterize the significance of 
a detection candidate. In Gaussian noise, the maximum 
likelihood for the network is obtained by maximizing the 
network S/N, pnet, 


Pnet = maxy^ l•^^(®)l^ 
i 



(16) 


this, therefore, is the simplest useful candidate ranking 
statistic. 


A. Uncertainty and the Fisher matrix 


We can predict the uncertainty in the detection 
pipeline’s ML estimates using the CRLB. The CRLB 
has been widely applied in GW data analysis to esti¬ 
mate parameter estimation uncertainty [3, 17, 18, 47- 
49].® As we noted, there are significant caveats to the 
GRLB at low or moderate S/N [35-38]. However, here 
we will be concerned more with gaining intuition from the 
block structure of the Fisher matrix than its numerical 
value. Furthermore, the Fisher matrix in its own right— 
independent of its suitability to describe the parameter 
covariance—is a well-defined property of any likelihood 
function, and we will exploit it as such in Sec. III. 

We will momentarily consider the likelihood for a single 
detector: 


'C(L/;pj,7i,Tj,0i„) 


(x exp 


2 Jo 


duj 


(17) 


with Ai(a;;pj,7i,Tj,0i„) given by Eq. (11). 

The Fisher information matrix for a measurement y 
described by the unknown parameter vector 6 is the con¬ 
ditional expectation value 


= E 


/ d\ogCiYf,e) \ 

I d9, J 


dlog£(Y ,;0)\ 

dOk J 



(18) 


The Fisher matrix describes how strongly the likeli¬ 
hood depends, on average, on the parameters. Further¬ 
more, it provides an estimate of the mean-square error 
in the parameters. If 9 is an unbiased estimator of 0, 
9 = 9-9 is the measurement error, and E = E [99] is 
the covariance of the measurement error, then the GRLB 
says that S > 1“^, in the sense that (S — 1“^) is posi¬ 
tive semidefinite. 

Note that if log£ is twice differentiable in terms of 9, 
then the Fisher matrix can also be written in terms of 
second derivatives as 


Xjk = E 


' d^logCiY,;9) 
dOjOdk 


(19) 


When (as in our assumptions) the likelihood is Gaus¬ 
sian,^ Eq. (18) simplifies to 



( 20 ) 


This form is useful because it involves manipulating the 
signal Xiiuj) rather than the entire observation Y{uj). 
In terms of the fcth S/N-weighted moment of angular 
frequency. 



IM^)P 

Si{uj) 




Si{uj) 



( 21 ) 


the Fisher matrix for the signal in the Ah detector is 


/ Xe,,0i \ 

\^0i,0ia Pi^^Oi^,0iii ) 


( 22 ) 


The top-left block describes only the extrinsic parame¬ 
ters, and is given by 


Pi 

Pr ( 1 

X0ifii = 7i 0 

n \ 0 


li n 

0 0 \ 

Pj^ -Pi^i 

-pi'^Ldi p^'^uj'^i ) 


(23) 


(This is equivalent to an expression given in Ref. [25].) 
The bottom row and right column of Eq. (22) describe 


® The Fisher matrix is also used in construction of CBC matched- 
filter banks. The common procedure is to place templates uni¬ 
formly according to the determinant of the signal space metric, 
which is the Fisher matrix. This is equivalent to uniformly sam¬ 
pling the Jeffreys prior. In practice, this is done either by con¬ 
structing a hexagonal lattice [oO] or sampling stochastically ['J- 
55]. 


^ This assumes that the merger occurs at a frequency outside the 
sensitive “bucket” of the detector’s noise PSD. There are addi¬ 
tional terms if the GW spectrum drops to zero within the sen¬ 
sitive bandwidth of the detector, as can be the case for neutron 
star-black hole mergers; see Ref. [56]. 
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the intrinsic parameters and how they are coupled to the 
extrinsic parameters. We show in Appendix A that we 
need not consider the intrinsic parameters at all if we are 
concerned only with sky localization. 

For our likelihood, the CRLB implies that 


parameters. In this case, we integrate away the nuisance 
parameters, forming the marginal posterior 


p(/3|y) = 


p(y|^, A)p(/3, A) 

p{y) 


d\ 


(27) 


/I _0 _0 \ 

I 0 Pi Pi ^z/^rnis.z I 

\ 0 Pi CUi/cUrnisy Pi /^rms,2 / 

_ _ (24) 

where Wrms.i^ = — uji^. This structure implies that 

errors in the S/N are uncorrelated with errors in time and 
phase and that there is a particular sum and difference 
of the times and phases that are measured independently 
(see Appendix B). 

Reading off the tt element of the covariance matrix 
reproduces the timing accuracy in Eq. (24) of Ref. [17], 

std(ri -Ti)>J (T-i)^^ = —(25) 

* ^rms,2 

Fairhurst [17] goes on to frame the characteristic posi¬ 
tion reconstruction accuracy of a GW detector network 
in terms of time delay triangulation, with the above for¬ 
mula describing the time of arrival uncertainty for each 
detector. In Appendix C, we show how to extend this for¬ 
malism to include the phases and amplitudes on arrival 
as well. 



II. BAYESIAN PROBABILITY AND 
PARAMETER ESTIMATION 


In the Bayesian framework, parameters are inferred 
from the data by forming the posterior distribution, 
p{6\y), which describes the probability of the parameters 
given the observations. Bayes’s rule relates the likelihood 
p{y\0) to the posterior p[9\y), 


p{0\y) 


p{y\e)p{e) 

p{y) 


(26) 


introducing the prior distribution p{9) which encapsu¬ 
lates previous information about the parameters (for ex¬ 
ample, arising from earlier observations or from known 
physical bounds on the parameters) and the evidence 
p{y) which can be thought of as a normalization factor 
or as describing the parsimoniousness of the model. 

The choice of prior is determined by one’s astrophysi- 
cal assumptions. During LIGO’s sixth science run when 
LIGO’s Bayesian CBG parameter estimation pipelines 
were first deployed, the prior was taken to be isotropic in 
the sky location and binary orientation and uniform in 
volume, arrival time, and the component masses [31]. 

In Bayesian inference, although it is often easy to write 
down the likelihood or even the full posterior in closed 
form, usually one is interested in only a subset /3 of all 
of the model’s parameters, the others A being nuisance 


with 9 — (/3, A). For instance, for the purpose of locating 
a GW source on the sky, all parameters but (a, 6) are 
nuisance parameters. 


III. BAYESTAR LIKELIHOOD 


For the purpose of rapid sky localization, we assume 
that we do not have access to the GW data Y itself and 
that our only contact with it is through the ML param¬ 
eter estimates {{pi,yi,fi}i Although this is a sig¬ 

nificant departure from conventional GW parameter es¬ 
timation techniques, we can still apply the full Bayesian 
machinery of Eq. (27) to compute a posterior distribution 
for the sky location. 

The relevant likelihood is now the probability of the 
ML estimates, conditioned upon the true parameter val¬ 
ues, and marginalized over all possible GW observations: 



oc 


J p{Y\9)p{9)dY. 


Y|MLE(Y)={{ei}i,ei„} 


(28) 


Although we may not be able to evaluate this equation 
directly, with some educated guesses we can create a like¬ 
lihood that has many properties in common with it. Any 
valid approximate likelihood must have the same Fisher 
matrix as shown in Eq. (22). It must also have the same 
limiting behavior: it should be periodic in the phase error 
7 i and go to zero as fi ±oo, pi —>■ 0, or pi —>■ oo. Addi¬ 
tionally, when fi = 0, the distribution of pf should reduce 
to a noncentral distribution with 2 degrees of freedom, 
centered about pi^, because the complex matched-filter 
time series Zi{t) is Gaussian (under the ideal assumption 
the GW strain time series is Gaussian). 

These conditions could be satisfied by realizing a mul¬ 
tivariate Gaussian distribution with covariance matrix 
Tj = and then replacing individual quadratic terms 
in the exponent of the form —0^/2 with cos0. 

A more natural way is to plug the signal model from 
Eq. (11) evaluated at the ML parameter estimates into 
the single-detector likelihood in Eq. (17): 


p (di 9^ := p (Yi{uj) = 9) 


oc exp 




i?(w;0i„) 


i{9in) 


Pi ^ij'yi—i^Ti) F7(a;, 9^^) 
a-i{9in) S^{uj) 


Si{uj) 

2 

duj 


■ (29) 


If we further assume that the intrinsic parameters are 
equal to their ML estimates, 0in = 0in, then this reduces 
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to what we call the autocorrelation likelihood, 




oc exp 


1 




(30) 


with 7 i = Ti = Ti — Ti, and the template’s auto¬ 

correlation function ai(t]9ij^) defined as 


2 

dw. (31) 

Some example autocorrelation functions and correspond¬ 
ing likelihoods are shown in Fig. 1. To assemble the joint 
likelihood for the whole network, we form the product 
of the autocorrelation likelihoods from the individual de¬ 
tectors: 


^in) • 


Hito-Ou 


p{{pi,li,n},\{p^,7i,n}^) 


(X exp 



i i 

(32) 


Using Eq. (19), we can compute the Fisher ma¬ 
trix elements for the autocorrelation likelihood given by 
Eq. (30), with the detector subscript suppressed, 


T — 1 

-^pp - -^7 

T — 0 

-i-'P'y — 

J — 0 

J^pr — yj-i 

^77 ~ ^ J 

ji-rr — P 

-L-yT - P 

where 


|a(t)|^ w{t\ p)dt^ 


-T 


3? [a* {t)d{t)] w{t; p)dt, 


l-T 

pT 


l-T 


[a* {t)d{t)] w{t; p)dt, 


(34) 

(35) 

(36) 


exp 


H^;p) = 


^ Ht)\^ 


^ |a(t)|^ 


h 


^|a(t)r 


exp 




^|a(Ol 




^|a(Ol 


dt' 

(37) 


In the following section, we discuss some key properties 
of the autocorrelation likelihood. 


A. Properties 


First, the autocorrelation likelihood has the elegant 
feature that if we were to replace the autocorrelation 
function with the S/N time series for the best-matching 
template, z{T-,9in), we would recover the likelihood for 
the full GW time series, evaluated at the ML estimate of 
the intrinsic parameters, viz.: 


exp 


i i 


(33) 


[We have omitted the term J \Yi{(jj)\^/S(ijj)duj, which 
takes the place of the earlier term and is only im¬ 
portant for normalization.] The numerical scheme that 
we will develop is thus equally applicable for rapid, 
coincidence-based localization, or as a fast extrinsic 
marginalization step for the full parameter estimation. 

Second, observe that at the true parameter values, 
6i = Oi, the logarithms of Eqs. (30) and (17) have the 
same Jacobian. This is because the derivatives of the 
autocorrelation function are 




with w” defined in Eq. (21). For example, the first few 
derivatives are 


d(0) = ioj, d(0) = — 


The notation denotes a modified Bessel function of 
the first kind. Matrix elements that are not listed have 
values that are implied by the symmetry of the Fisher 
matrix. Note that the minus signs are correct but a little 
confusing; despite them, > 0 and 1^,- < 0. The 

time integration limits [—T, T] correspond to a flat prior 
on arrival time or a time coincidence window between 
detectors. 

We can show that the weighting function w{t;p) ap¬ 
proaches a Dirac delta function as p —>■ oo, so that 
the Fisher matrix for the autocorrelation likelihood 
approaches the Fisher matrix for the full GW data, 
Eq. (23), as p —>■ oo. The Bessel functions asymptoti¬ 
cally approach 

/o(a:), Ii{x) —>■ . as a; —>■ oo. 

V27ra; 


For large p, the exponents of dominate Eq. (37), and 
we can write 


exp 


w{t; p) 


y|a(t)|^ 


exp 


f-T 




as p —)■ oo. 


dt' 


The Taylor expansion of |a(t)P is 


|a(t)p = l + 


1 

2 I dt^ 


a(t)|2 ]f + Oit^) 


t=oy 


— 1 — Wrms t + 0{t ). 


a(0) = 1 
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FIG. 1. The autocorrelation likelihood for a (1.4,1.4) M© binary as observed by four detector configurations: from top to 
bottom, the final sensitivity achieved by the LIGO Hanford, LIGO Livingston, and Virgo detectors in their “initial” configuration 
and the final Advanced LIGO design sensitivity. The left panels show the noise amplitude spectral densities. The middle panels 
show the absolute value of the autocorrelation function. The right panel shows the phase-marginalized autocorrelation likelihood 
for S/N=l, 2, 4, and 8. In the right panel, the time scale is normalized by the S/N so that one can see that as the S/N increases, 
a central parabola is approached (the logarithm of a Gaussian distribution with standard deviation given by the Fisher matrix). 


Substituting, we find that w(t; p) approaches a normal¬ 
ized Gaussian distribution: 


w{t; p) 


exp 

^ 2 2j.2 

2^ i 



1 2 2ur\2 

-^P Wrms (t ) 

dt' 


And finally, because the Dirac delta function may be de¬ 
fined as the limit of a Gaussian, w{t; p) —>■ 6{t) as p —>■ oo. 


the full signal model explicit. Define 

— P ’ ®77(p)j 

Trr = P^Uj'^ ■ firrCp), 

Tfyr — P ^ ' ^’^ri.p)- 

Now, the Qy® contain the integrals from Eqs. (34, 35, 36) 
and encode the departure of the autocorrelation likeli¬ 
hood from the likelihood of the full data at a low S/N. 
All of the Qij{p) are sigmoid-type functions that asymp¬ 
totically approach 1 as p —>■ oo (see Figs. 2 and 3). The 


We can now write the Fisher matrix for the autocor¬ 
relation likelihood in a way that makes a comparison to 


The Fish(er) factor. 
































FIG. 2. CRLB on root mean square timing uncertainty and phase error, using the likelihood for the full GW data [Eq. (17); 
dashed diagonal line] or the autocorrelation likelihood [Eq. (30); solid lines] with a selection of arrival time priors. 



FIG. 3. Ratio between Fisher matrix elements (solid: 
dashed: dotted: Qtt) for the autocorrelation likelihood 

and the full GW data. Golors correspond to different arrival 
time priors as in Fig. 2. 

transition S/N pcrit is largely the same for all three non¬ 
trivial matrix elements, and is determined by the time 
coincidence window T and the signal bandwidth Wrms- 

In the limit of large S/N, our interpretation is that the 
point estimates (/3, 7 , f) contain all of the information 
about the underlying extrinsic parameters. 

On the other hand, in the low S/N limit, the diminish¬ 
ing value of fiij(p) reflects the fact that some information 
is lost when the full data x are discarded. Concretely, as 
the prior interval T becomes large compared to l/pcurms, 


the ML estimator becomes more and more prone to pick¬ 
ing up spurious noise fluctuations far from the true sig¬ 
nal. Clearly, when the coincidence window T is kept as 
small as possible, more information is retained in the ML 
point estimates. Put another way, if T is small, then the 
transition S/N pcrit is also small and fainter signals be¬ 
come useful for parameter estimation. In this way, the 
BAYESTAR likelihood exhibits the threshold effect that 
is well known in communication and radar applications 
[57-59]. 

In the following sections, we describe our prior and our 
numerical schemes to integrate over nuisance parameters, 
which together amount to the BAYESTAR algorithm. 


IV. PRIOR AND PROBLEM SETUP 

The detection pipeline supplies a candidate, 
{{pi, 7 i, f/li, 0in}, and discretely sampled noise PSDs, 
Si{ujj), for all detectors. We compute the GW signal for 
a source with intrinsic parameters equal to the detection 
pipeline’s estimate, H{uj;9in)- Then, we find the S/N=l 
horizon distance ri i for each detector by numerically 
integrating Eq. (10). 

We have no explicit prior on the intrinsic parameters; 
in our analysis they are fixed at their ML estimates, 0in.® 


® As noted in footnote 6, the detection template bank is typically 
designed to uniformly sample the Jeffreys prior on the intrin¬ 
sic parameters. Due to the equivalence of marginalization and 
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The arrival time prior is connected to the origin of the 
detector coordinate system. Given the Earth-hxed coor¬ 
dinates of the detectors and the arrival times we 
compute their averages weighted by the timing uncer¬ 
tainty formula: 


n = 


(Pi^rms.z) 

E ~ 




') = 


2 (p2^rms,i) 

E ~ 


(p2^rms,i) 


V. MARGINAL POSTERIOR 


The marginal posterior as a function of the sky location 
is 


f{a,S) oc 


exp 



i i 

r'^d(l)c dr dt(^ d cos i dip. 


(38) 


Then, we subtract these means: 

nj^ni-(n), fit-fi-(f). 

In these coordinates, now relative to the weighted detec¬ 
tor array barycenter, the arrival time prior is uniform in 
—T < t < T, with T = max |ni|/c -I- 5 ms. 

The distance prior is a user-selected power of distance. 


To marginalize over the coalescence phase, we can 
write 7 i = 7 ( + 20c- Then, integrating over 0c and sup¬ 
pressing normalization factors, we get 


f{a,6) 


exp 








r'^dr dt^ d cos t dip. 


(39) 


p{r) oc 



if ^min < r < r’cnax 
otherwise, 


In the above equation, we need not distinguish between 
7 i and 7 ' because the likelihood is now invariant under 
arbitrary phase shifts of all of the detectors’ signals. 


where m = 2 for a prior that is uniform in volume and 
TO = — I for a prior that is uniform in the logarithm of the 
distance. If a distance prior is not specified, the default is 
uniform in volume out to the maximum S/N=4 horizon 
distance: 


TO — 2, rjnin — 0: 


I 

J’max = -maxri,i. 
4 i 


Finally, the prior is uniform in — 1 < cosi < I and 

0 < 0 < TT. 

We compute the autocorrelation function for each de¬ 
tector from t = 0 to t = T at intervals of At = I//s, 
where fs is the smallest power of 2 that is greater than 
or equal to the Nyquist rate. Because BNS signals typ¬ 
ically terminate at about 1500 Hz, a typical value for 
At is (4096 Hz)“^. We use a pruned fast Fourier trans¬ 
form (FFT) because for BNS systems, the GW signal re¬ 
mains in LIGO’s sensitive band for ~I00-I000 s, whereas 
T ~ 10 ms.i° 


maximization with respect to a parameter under a Gaussian dis¬ 
tribution, fixing the intrinsic parameters at their ML estimates 
is roughly equivalent to selecting the Jeffreys prior. 

See http: //www. f f tw. org/pruned.html for a discussion of meth¬ 
ods for computing the pruned FFT, the first K samples of an 
FFT of length N. 


A. Integral over angles and time 

The integrand is periodic in ip, so simple New- 
ton-Gotes quadrature over ip exhibits extremely rapid 
convergence (see Fig. 4). We therefore sample the poste¬ 
rior on a regular grid of ten points from 0 to tt. 

The integral over cost converges just as rapidly with 
Gauss-Legendre quadrature (see Fig. 4), so we use a 
ten-point Gauss-Legendre rule for integration over cos t. 

We sample regularly from —T to T at intervals of 
At. This is typically ~ 2(10 ms)(4096 Hz) « 80 samples. 
We use Gatmull-Rom cubic splines to interpolate the real 
and imaginary parts of the autocorrelation functions be¬ 
tween samples. 


B. Integral over distance 


The distance integral is now performed differently from 
what we initially described in Refs. [60, 61]; the method 
described in the present work is about an order of mag¬ 
nitude faster. We define pi = ujijr in order to absorb all 
of the distance-independent terms in the amplitudes into 
uji and then define 



I 


(40) 


6 = 




(41) 
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(a) Polarization angle 



number of function evaluations 
(b) Inclination angle 



FIG. 4. Relative error in the BAYESTAR integration scheme as a function of the number of Gaussian quadrature nodes. The 
two panels describe (a) the integral over the polarization angle ip (b) the integral over the inclination angle t. 


The innermost integral over distance r may then be writ¬ 
ten as 



(42) 


The coefficients and b are non-negative and indepen¬ 
dent of distance, p has a maximum value of 



(47) 


or, completing the square, 



r 1 

pT'raa.x. 

(P P\ 

_ 

2p^ 

exp 


/ exp 

-- 

lo 



L^o J 

T’min 

\r To J 


[ rro J 


(43) 


= exp 



(44) 


where 


The symbol /q denotes an exponentially scaled Bessel 
function. In the limit of large argument, /o(|a;|) 
exp(|x|)/-\/27r|x| [G2, 63]^^. The scaled Bessel function 
is useful for evaluation on a computer because it has a 
relatively small range (0,1] and varies slowly in propor¬ 
tion to 


To = 2p^/h 

lo{x) = exp(-|a;|)/o(a;). 


(45) 

(46) 11 


http://dlmf.nist.gov/10.40.El 
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FIG. 5. Partition of the parameter space of the distance in¬ 
tegral into three regions for (bi)cubic interpolation. 


the approximate likelihood exp(—(p/r — p/tq)^) is max¬ 
imized when r = rQ. The likelihood takes on a factor t] 
(say, r] = 0.01) of its maximum value when 


r = r± 


J_ ^ V-'^ogv \ 
ro P J 


(55) 


We have now identified up to five breakpoints that par¬ 
tition the distance integrand into up to four intervals with 
quantitatively distinct behavior. These intervals are de¬ 
picted in Fig. 6 with the distance increasing from left to 
right. There is a left-hand or small distance tail in which 
the integrand is small and monotonically increasing, a 
left- and right-hand side of the maximum likelihood peak, 
and a right-hand tail in which the integrand is small and 
monotonically decreasing. These breakpoints are 


^break — {r € 


^min 

r_ 

ro 

r+ 

^max 


■ ^niin ^ ^ 


(56) 


1. Parameter grid 

This integral is not particularly amenable to low-order 
Gaussian quadrature. However, luckily is a very well- 
behaved function of p and rg, so we evaluate it using a 
lookup table and bicubic interpolation. The lookup table 
is produced in logarithmic coordinates 

x = \ogp, 2 / = logro. (48) 

As shown in Fig. 5, the function basically consists of a 
plateau region in the upper-left half of the plane delim¬ 
ited by the lines y = x and x = logpoj with 


We use these breakpoints as initial subdivisions in an 
adaptive Gaussian quadrature algorithm^^. This func¬ 
tion estimates the integral over each subdivision and each 
interval’s contribution to the total error, then subdivides 
the interval that contributes the most to the error. Sub¬ 
divisions continue until a fixed total fractional error is 
reached. In this way, most integrand evaluations are ex¬ 
pended on the most important distance interval, whether 
that happens to be the tails (when the posterior is dom¬ 
inated by the prior) or the peak (when the posterior is 
dominated by the observations). 

3. Interpolation 



^max 


if m > 0 
if m < 0. 


(49) 


We tabulate on a 400 x 400 regular grid spanning the 
range 


Xo = logmin(po,Pmax) 

(50) 

Xmin = Xo - (1 + ^/2)a 

(51) 

Xmax ~ logPnrax 

(52) 

ymin — 2Xo Xniax 

(53) 

2/max — Xo H“ OL 

(54) 


where a = 4 is a constant parameter that determines the 
extent of the grid. 


The interpolant is evaluated slightly differently de¬ 
pending on which of the three regions marked I, II, and 
III in Fig. 5 contains the point of interest. In region 
I, we use bicubic interpolation of logC/ in x and y. In 
region II, we use univariate cubic interpolation of logC/ 
in X, with the sample points taken from the horizontal 
boundary between regions I and II. In region III, we use 
univariate cubic interpolation of log^ in u = (a: — 2 /)/ 2 , 
with the sample points taken from the downward diag¬ 
onal boundary between regions I and III. Finally, the 
distance integral ^ is obtained by multiplying the inter¬ 
polated value of by exp . For a 400 x 400 grid, 

the entire lookup table scheme is accurate to a relative 
error of about I0“® in ^ (see Fig. 7). 


2. Lookup table construction 

The lookup table for ^ is populated as follows. If we 
neglect both the Bessel function and the r™ prior, then 


For instance, GNU Scientific Library’s gsl_integrate_qagp func¬ 
tion, http://www.gnu.org/software/gsl/manual/html_node/ 

QAGP-adaptive-integration-with-known-singular-points. 
html. 
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V /-/ -/ / ' . . . 


FIG. 6. Illustration of initial snbdivisions for the distance 
integration scheme. The distance increases from left to right. 
In the color version, the left-hand tail, the left- and right-hand 
sides of the maximnm likelihood peak, and the right-hand tail, 
are colored cyan, red, green, and blue, respectively. 



FIG. 7. Relative error in the BAYESTAR distance integral 
interpolation scheme as a function of the size of the grid. 


VI. ADAPTIVE HEALPIX SAMPLING 

We have explained how we evaluate the marginal pos¬ 
terior at a given sky location. Now we must specify where 
we choose to evaluate it. 

Our sampling of the sky relies completely on 
HEALPix [64], a special data structure designed for 
all-sky maps. HEALPix divides the sky into equal-area 


pixels. There is a hierarchy of HEALPix resolutions. 
A HEALPix resolution may be designated by its order 
TV. The N = 0th order or base tiling has 12 pixels. 
At every successive order, each tile is subdivided into 
four new tiles. A resolution may also be referred to by 
the number of subdivisions along each side of the base 
tiles, Aside = 2^. There are Apix = 12Aside^ pixels at 
any given resolution. The HEALPix projection uniquely 
specifies the coordinates of the center of each pixel by 
providing a mapping from the resolution and pixel index 
(.^sideGpix) to right ascension and declination (a, <5). 

The BAYESTAR adaptive sampling process works as 
follows. We begin by evaluating the posterior probability 
density at the center of each of the Apix,o = 3072 pixels of 
an Aside ,0 = 16 HEALPix grid. At this resolution, each 
pixel has an area of 13.4 deg^. We then rank the pixels 
by contained probability (assuming constant probability 
density within a pixel) and subdivide the most proba¬ 
ble Apix,o/4 pixels into Apix,o new daughter pixels. We 
then evaluate the posterior again at the centers of the 
new daughter pixels, sort again, and repeat seven times. 
By the end of the last iteration, we have evaluated the 
posterior probability density a total of 8Apix,o times. On 
most subdivision steps, we descend one level deeper in 
HEALPix resolution. This process is illustrated in Fig. 8. 

The resulting map is a tree structure that describes 
a mesh of pixels with different resolutions. An exam¬ 
ple BAYESTAR subdivision is shown in Fig. 9. To con¬ 
vert this mesh into a Flexible Image Transport System 
(FITS) [65] image, we traverse the tree and flatten it into 
the highest resolution represented. The highest possible 
resolution is Aside = 2^^, with an area of « 10“^ deg^ 
per pixel. 

VII. PARALLELIZATION 

MCMC and similar stochastic schemes are typically 
very resistant to parallelization. However, BAYESTAR 
is completely deterministic and easily parallelizable be¬ 
cause each pixel can be evaluated independently from 
all of the others. BAYESTAR consists of nine compu¬ 
tationally intensive loops: the generation of the distance 
integral lookup table and the eight loops over pixels in 
the adaptive HEALPix sampling step. The iterations 
of each loop are distributed across multiple cores using 
OpenMP^"' compiler directives. In Sec. VHIE, we will 
show that BAYESTAR’s run time is almost perfectly pro¬ 
portional to the number of cores, demonstrating that the 
serial sections (the sorts between the adaptation steps) 
are a negligible contribution to the overall wall clock 
time. 


Although the resulting sky map contains Apjx 5 X 10® pixels, 
at most Ri 2 X 10^ pixels have distinct values. For the pur¬ 
pose of delivery to observers, therefore, the output is always 
gzip-compressed with a ratio of Ri 250 : 1. 
http;//openmp.org/. 
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2. Sort by probability and select top W/4 pixels 



3. Subdivide & replace with 
N new daughter pixels 


4. Sort by probability and 
select top N/4 pixels 

5. Subdivide & replace with 
N new daughter pixels 
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□ Q □ □ 
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6. Sort by probability and 
select top N/4 pixels 


Repeat 


FIG. 8. Illustration of the BAYESTAR adaptive HEALPix sampling scheme. 



FIG. 9. An example mnltiresolntion HEALPix mesh arising 
from the BAYESTAR sampling scheme (plotted in a cylindri¬ 
cal projection). This is event 18951 from Ref. [27]. 


VIII. CASE STUDY 

We have completed our description of the BAYESTAR 
algorithm. In Ref. [27], the authors presented a compre¬ 
hensive and astrophysically realistic sample of simulated 
BNS mergers. We focused on the first two planned Ad¬ 
vanced LIGO and Virgo observing runs as desribed in 
Ref. [3]. That work presented a catalog of 500 sky lo¬ 
calizations from BAYESTAR and LALINFERENCE and 
dealt with the quantitative position reconstruction accu¬ 
racy as well as the qualitative sky morphologies. In the 


present work, we will use the same data set but instead 
focus on demonstrating the correctness and performance 
of the BAYESTAR algorithm. 


A. Observing scenarios 

To review the assumptions made in Ref. [27], the two 
scenarios are: 

2015. —The first Advanced LIGO observing run, or 
“01,” scheduled to start in September 2015 and continue 
for three months. There are only two detectors partic¬ 
ipating in this run: LIGO Hanford (H) and LIGO Liv¬ 
ingston (L). Both detectors are expected to operate with 
a direction-averaged BNS merger range of 40-80 Mpc 
(though ongoing Advanced LIGO commissioning sug¬ 
gests that the higher end of this range will be achieved). 
As a result of having only two detectors, most localiza¬ 
tions are long, thin arcs a few degrees wide and tens to 
hundreds of degrees long. The median 90% credible area 
is about 600 deg^. 

2016. —The second observing run, “02,” with the two 
between Advanced LIGO detectors, upgraded to a BNS 
range of 80-120 Mpc, operated jointly with the newly 
commissioned Advanced Virgo detector (V), operating 
at a range of 20-60 Mpc. The run is envisioned as lasting 
for six months in 2016-2017. The detectors are assumed 
to have random and independent 80% duty cycles. Con¬ 
sequently, all three detectors (HLV) are in science mode 
about half of the time, with the remaining time divided 
roughly equally between each of the possible pairs (HL, 
HV, or LV) and one or fewer detectors (at least two GW 
facilities are required for a detection). Virgo’s range is as¬ 
sumed to be somewhat less than LIGO’s because its com¬ 
missioning time table is about a year behind. Although 
the simulated signals are generally too weak in Virgo to 
trigger the matched-filter pipeline and contribute to de¬ 
tection., even these subthreshold signals aid in position 
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reconstruction with LALINFERENCE by lifting degen¬ 
eracies. As a result, the median 90% credible area de¬ 
creases to about 200 deg^. 

All simulated sources have component masses dis¬ 
tributed uniformly between 1.2 and 1.6 Mq and ran¬ 
domly oriented spins with dimensionless magnitudes x = 
c|S|/Gm^ between -0.05 and -1-0.05. Sky positions and 
binary orientations are random and isotropic. Distances 
are drawn uniformly from reflecting a uniform 

source population (neglecting cosmological effects, which 
are small within the Advanced LIGO BNS range). 


B. Detection and localization 

The simulated waveforms were deposited in Gaussian 
noise that has been filtered to have the PSDs consistent 
with Ref. [3]. They were detected using the real-time 
matched-filter pipeline, GSTLALJNSPIRAL [66]. Can¬ 
didates with estimated false alarm rates (FARs) less than 
10“^yr“^ were considered to be “detections.” Because 
using Gaussian noise results in lower FARs than would 
be calculated in realistically glitchy detector noise, we 
imposed an additional detection threshold on the net¬ 
work S/N, p > 12, which has been found to corre¬ 
spond to a comparable FAR in the initial LIGO runs.^® 
Localizations for the detections were generated with 
BAYESTAR as well as the functionally equivalent and in¬ 
terchangeable LALINFERENCE_MCMC, LALINFER- 
ENCE_NEST, and LALINFERENCE_BAMBI samplers 
(collectively referred to as LALINFERENCE). 


C. Areas 

We measured sky localization areas for each event as 
follows. First, we ranked the HEALPix pixels by de¬ 
scending posterior probability. Then, we computed the 
cumulative sum of the pixels in that order. Finally, we 
searched for the index of the pixel of which the cumula¬ 
tive sum was equal to a given value: for example, 0.9 if we 
are interested in the 90% credible area. That pixel index 
times the area per pixel is the area of the smallest region 
of the specihed credible level. This area can be thought 
of as measuring the precision of the sky localization: it 
is a measure of the scale of the posterior distribution. 

We can construct a second measure, called the searched 
area, as the smallest such constructed area that contains 
the true location of the source. A telescope with a FOV 
that is small compared to the characteristic scale of the 
posterior would intercept the true location of the source 
after covering the searched area. This measure is mainly 


See Ref. [61] for an analysis of the effect of glitchy noise on de¬ 
tection and parameter estimation. 


useful because it measures the accuracy of the sky local¬ 
ization independently of the precision. In other words, it 
treats the sky map as merely a ranking statistic. 

Histograms of the 90% credible area and the searched 
area are shown in Fig. 10, broken down by observing sce¬ 
nario (2015 or 2016) and detector network (HL, HV, LV, 
or HLV). Note that there are no statistically significant 
differences in areas between BAYESTAR and LALIN- 
FERENGE, with the exception in the 2016/HLV configu¬ 
ration, for which some LALINFERENGE sky maps span 
about an order of magnitude less area than BAYESTAR. 
If we consider only events for which all three detec¬ 
tors contained a signal that was loud enough to trigger 
the matched-filter pipeline, the difference becomes much 
smaller and insignificant within 95% error bars. This is 
because if the signal is too weak to trigger the detec¬ 
tion pipeline in one of the detectors, then BAYESTAR 
receives no information about that detector. This issue 
does not occur in the two-detector configurations (HL, 
HV, or LV) because two or more triggers are required to 
report a detection candidate. 

This is a significant issue for the 2016 configura¬ 
tion, because the most accurate localizations are pos¬ 
sible when all three detectors are operating. However, 
there may be a simple remedy. As we noted in Sec. HI A, 
the BAYESTAR likelihood can be modihed to use, in¬ 
stead of the times, phases, and amplitudes on arrival, 
the full complex matched-filter time series from all de¬ 
tectors. The detection pipeline, GSTLALJNSPIRAL, 
would have to be modified to save and transmit a small 
interval of the complex S/N time series (perhaps a few 
tens of milliseconds) around the time of each detec¬ 
tion candidate. In addition to supplying the missing 
information for subthreshold signals, this would make 
BAYESTAR mathematically equivalent to the LALIN¬ 
FERENGE analysis, but with the intrinsic parameters 
fixed to their maximum-likelihood values. This idea will 
be pursued in future work. 


D. Self-consistency 

As we observed above, the area of a given credible 
region describes the precision of the sky localization, 
whereas the searched area describes the accuracy. How¬ 
ever, self-consistency requires that the two are related. 
For example, we should find that on average 90% of 
events have their true locations contained within their 
respective 90% credible regions. More generally, if we 
make a cumulative histogram of the credible levels corre¬ 
sponding to the searched areas of all of the events, then 
we should obtain a diagonal line (with small deviations 
due to finite sample size). This test, popularized for GW 
data analysis by Ref. [26] , is a necessary but not sufficient 
condition for the validity of any Bayesian parameter es¬ 
timation scheme. 

It is already well established that LALINFERENGE 
localizations satisfy the P-P plot test when deployed 
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area of 90% confidence region (deg^) searched area (deg^) 


FIG. 10. Cumulative histograms of sky area, broken down by observing run and detector network. The plots in the left column 
show the 90% credible area and the plots in the right column show the searched area. From top bottom, the rows refer to 
the following observing scenarios/network configurations: 2015/HL, 2016/HL, 2016/HV, 2016/LV, and 2016/HLV. The shaded 
regions represent the 95% confidence bounds. The magenta lines represent BAYESTAR and the blue lines LALINFERENCE. 
Where relevant, dotted lines show all events in the given network configuration and solid lines show only events for which the 
matched-filter pipeline triggered on all operating detectors. Note that statistically significant differences in areas between the 
BAYESTAR and LALINFERENCE localizations occur only for events that were below the detection threshold in one or more 
detectors. 
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with accurate templates and reasonable priors. We found 
at first that BAYESTAR’s P-P plots tended to sag below 
the diagonal, indicating that though the accuracy (i.e., 
searched area) was comparable to LALINFERENCE, the 
precision was overstated, with confidence intervals that 
were only about 70% of the correct area. This was rec¬ 
tified by prescaling the S/Ns from GSTLALJNSPIRAL 
by a factor of 0.83 prior to running BAYESTAR. This 
correction factor suggests that, for example, a S/N=10 
trigger from GSTLALJNSPIRAL has the effective infor¬ 
mation content of a S/N=8.3 signal. The missing infor¬ 
mation may be due to losses from the discreteness of the 
template bank, from the singular value decomposition, 
from mismatch between the matched-filter templates and 
the simulated signals, from the small but nonzero corre¬ 
lations between masses and intrinsic parameters, or from 
elsewhere within the detection pipeline. The correction 
is hard coded into the rapid localization. With it, the 
P-P plots are diagonalized without negatively affecting 
the searched area (see Fig. 11). 


E. Run time 

Since BAYESTAR is designed as one of the final 
steps in the real-time BNS search, it is important 
to characterize how long it takes to calculate a sky 
map. We compiled BAYESTAR with the Intel C Gom- 
piler (ice) at the highest architecture-specific optimiza¬ 
tion setting (-ipo -03 -xhost). We timed it un¬ 
der Scientific Linux 6.1 on a Supermicro SuperServer 
6028TP-HTTR system with dual 8-core Intel Xeon 
E5-2630 v3 GPUs clocked at 2.40 GHz, capable of execut¬ 
ing 32 threads simultaneously (with hyperthreading). In 
Fig. 12, we show how long it took to calculate a localiza¬ 
tion with BAYESTAR as the number of OpenMP threads 
was varied from 1 to 32. This is a violin plot, a smoothed 
vertical histogram. The magenta regions show run times 
for a two-detector network (HL) modeled on the first 
scheduled Advanced LIGO observing run in 2015, and 
the blue regions show run times for a three-detector net¬ 
work (HLV) based on the second planned observing run 
in 2016. These are the two observing scenarios that are 
discussed in Ref. [27]. 

Several features are apparent. First, at any number of 
threads, the two configurations have similar run times, 
although the 2016 events contain a subpopulation of out¬ 
liers that take about 2.5 times as long as the 2015 events. 
These are probably due to taking one of the more ex¬ 
pensive code branches in the distance integral interpo¬ 
lation. Second, the run times decrease proportionally 
to the number of threads. Based on experiences run¬ 
ning BAYESTAR on the 32-core (64 threads with hyper¬ 
threading) cluster login machine, we expect the almost 
ideal parallel speedup to continue on machines with even 
more processors. 

With just one thread, the BAYESTAR analysis takes 
76-356 s, already orders of magnitude faster than the full 


parameter estimation. With 32 threads, BAYESTAR 
takes just 4-13 s. In practice, BAYESTAR’s data 
handling (reading the detectors’ PSDs, communicat¬ 
ing with the GW candidate database, writing FITS 
files) takes an additional 15 s or so, though this over¬ 
head could be reduced by parallelizing many of these 
steps. The overall latency is comparable to the other 
stages (data aggregation, trigger generation, alert dis¬ 
tribution) in the real-time BNS analysis; therefore, 
any significant further speedup would require significant 
changes through Advanced LIGO computing and infras¬ 
tructure. The 32-thread configuration is representative 
of how BAYESTAR might be deployed in early Ad¬ 
vanced LIGO.^® For comparison, sky localization with 
LALINFERENGE takes about 100h [61]. 

Note that this benchmark shows BAYESTAR to be 
an order of magnitude faster than what was reported in 
Refs. [60, 61] due to the changes in the distance integra¬ 
tion scheme that we noted in Sec. VB. 


IX. FUTURE WORK 

One immediately pressing direction for future work is 
to address the issue of subthreshold signals, as this will 
be a major issue when Advanced Virgo comes online in 
2016-2017. Using the full S/N time series in place of the 
autocorrelation function seems like a promising avenue; 
implementing this requires some infrastructure changes 
to both the matched-filter pipeline and BAYESTAR. 
Along these lines, we also refer the reader to Ref. [67] 
for a similar, non-MGMG approach to the rapid explo¬ 
ration of the full parameter space. 

A more open-ended question is how to account for spin 
precession. The simulations in Ref. [27] and in this paper 
featured extremely modest spins of x < 0.05, consistent 
with the fastest known pulsars in binaries [68, 69]. The 
signals were detected using a template bank that lacked 
spins entirely. Reference [70] shows that using nonspin¬ 
ning BNS templates for parameter estimation has negli¬ 
gible impact on sky localization. However, if one or both 
companions are spinning as fast as a millisecond pulsar, 
X ~ 0.4 [71], or even near breakup, x ~ 0.7, then the or¬ 
bital plane may precess; in this case, spins can no longer 
be neglected for detection [69] and may also be important 
for parameter estimation. Since spinning BNS searches 
are still an active area of development, BAYESTAR’s sky 
localization accuracy in this regime should be reexamined 
in the future. 

Although the response time of BAYESTAR has been 
driven by the anticipated time scales for kilonova and af¬ 
terglow emission, a recurring question is whether there 


BAYESTAR has been successfully ported to the Intel’s Many 
Integrated Core architecture and has been tested in a 500 thread 
configuration on a system with dual Intel Xeon Phi 5110P co¬ 
processors. 
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searched posterior mass searched posterior mass 


FIG. 11. P-P plots for BAYESTAR and LALINFERENCE localizations in the 2015 and 2016 configurations. The gray lozenge 
around the diagonal is a target 95% confidence band derived from a binomial distribution. 



FIG. 12. Violin plot of BAYESTAR run times as the number 
of OpeuMP threads is varied from 1 to 32. The 2015 scenario 
is shown in red and the 2016 scenario in blue. 


is any detectable EM signal in the seconds before, dur¬ 
ing, and after the merger itself. Since the GW inspi¬ 
ral signal is in principal detectable for up to hundreds 
of seconds before merger, one could imagine position¬ 
ing rapidly slewing instruments to search for any prompt 
emission. This concept was explored by the authors [66] . 

On the topic of very low-latency localization, we also 
recommend Chen & Holz [72] , who propose a rapid local¬ 


ization scheme that is similar to ours, but even faster be¬ 
cause it makes some additional compromises: their like¬ 
lihood is strictly Gaussian, so one more marginalization 
integral (the integral over arrival time) can be performed 
analytically. 


X. CONCLUSION 

We have presented a novel, fast, accurate, Bayesian al¬ 
gorithm for inferring the sky locations of compact binary 
merger sources that may soon be detected by advanced 
ground-based GW detectors. For BNS systems with 
small spins, we have shown that BAYESTAR produces 
sky maps that are as accurate as the full MCMC param¬ 
eter estimation code but can do so within ~10s after a 
detection. Still faster response times should be possible 
in the future (if warranted) by deploying BAYESTAR on 
machines with more cores or by distributing BAYESTAR 
across multiple computers. 

Following a BNS merger, the signal will be detected 
by the matched-Hlter pipeline within tens of seconds; 
an alert containing the time and estimated significance 
of the event can be distributed almost immediately (al¬ 
though a human validation stage that may be present 
at the beginning of the first observing run may intro¬ 
duce some additional latency). The localization from 
BAYESTAR will be available tens of seconds to a minute 
later. Finally, the refined localization and the detailed es¬ 
timates of masses and spins from LALINFERENCE will 
be distributed hours to days later. 

Relevant time scales for possible EM counterparts to 
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GW signals include seconds (the prompt GRB signa¬ 
ture), hundreds of seconds (extended emission and X-ray 
plateaus that are observed for some short GRBs), min¬ 
utes to hours (X-ray and optical afterglow), hours to days 
(the kilonova or the blue flashes associated with unbound 
ejecta or disk winds), and days to years (the radio after¬ 
glow). For the first time, we are able to provide accurate 
localizations before the peak of any of these EM signa¬ 
tures (except for the short GRB or any premerger signal). 
Even for components like the kilonova that should peak 
within hours to days, the availability of the localizations 
within seconds might provide a window of several hours 
to obtain tiled images of the area before the EM emis¬ 
sion begins. These could be used as reference images, 
crucial at optical wavelengths for establishing the rapid 
rise and quickly distinguishing from slower background 
transients. 
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to multiplicative factors, 0151 ( 0 ;) = 02 ^ 2 ( 0 ;) = ••• = 
c„5„(o>) = 5 ( 0 ), then we can show that the errors in the 
intrinsic parameters (masses) are not correlated with sky 
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averages are correlated with the intrinsic parameters, but 
neither are correlated with the differences. Since only the 
differences inform the sky location, this gives us license 
to neglect uncertainty in masses when we are computing 
the sky resolution. 

This is easiest to see if we make the temporary change 
of variables p —)■ = logp. This allows us to factor 
out the S/N dependence from the single-detector Fisher 
matrix. The extrinsic part becomes 
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Due to our assumption that the detectors’ PSDs are pro¬ 
portional to each other, the noise moments are the same 
for all detectors, = uj^. Then, we can write the 
single-detector Fisher matrix as 

(A2) 


with the top-left block A comprising the extrinsic param¬ 
eters and the bottom-right block C the intrinsic param¬ 
eters. 

Information is additive, so the Fisher matrix for the 
whole detector network is 


/ Pi^A 0 
0 P2^A 

0 0 


0 pi^B \ 

P^B 

0 : 

Pn'^A pn'^B 

PN^B" p^et^C J 


(A3) 


https://ligo-vcs.phys.uwm.edu/cgit/lalsuite/tree/ 
lalinference. 

http://www.Isc-group.phys.uwm.edu/daswg/projacts/ 

lalsuite.html. 

http://www.astropy.org. 

http://healpix.sourceforge.net. 


Now we introduce the change of variables that sacrihces 
the A^th detector’s extrinsic parameters for the network 


(A4) 


averages. 
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and replaces the first — 1 detectors’ extrinsic parame¬ 
ters with differences, 


7i -t ^ 7 * = 7 i - 7 / for * = 1 ,..., iV - 1. (A5) 

Ti 6n = Ti-T j 


( 1 0 
0 1 


0 1 0 \ 

0 1 0 


J = 


0 


0 


-Pl -P2 
Pn"^ Pn^ 

\ 0 0 


1 


— PW-1^ 

Pn^ 

0 


1 

1 

0 



(A6) 


The Jacobian matrix that describes this change of vari- 


The transformed network Fisher matrix is block diagonal, 
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(A7) 


The top-left block contains iV — 1 relative amplitudes, 
phases, and times on arrival, all potentially correlated 
with each other. The bottom-right block contains the 
average amplitudes, phases, and times, as well as the 
masses. The averages and the masses are correlated with 
each other, but are not correlated with the differences. 
Because only the differences are informative for sky lo¬ 
calization, we drop the intrinsic parameters from the rest 
of the Fisher matrix calculations in the Appendix. 


Appendix B: Interpretation of phase and time errors 


;^(7 ± 7 t-), diagonalizes the Fisher matrix: 
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Thus, in the appropriate time units, the sum and differ¬ 
ence of the phase and time of the signal are measured 
independently. 


The Fisher matrix in Eq. (23) is block diagonal, which 
implies that estimation errors in the signal amplitude p 
are uncorrelated with the phase 7 and time r. A sequence 
of two changes of variables lends some physical interpre¬ 
tation to the nature of the coupled estimation errors in 
7 and r. 

First, we put the phase and time on the same footing 
by measuring the time in units of 1/ y/uff with a change 
of variables from r to 7,- = y/uffr: 
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The second change of variables, from 7 and 71- to 7 ± = 


Appendix C: Position resolution 

Finally, we will calculate the position resolution of 
a network of GW detectors. We could launch directly 
into computing derivatives of the full signal model from 
Eq. (6) with respect to all of the parameters, but this 
would result in a very complicated expression. Fortu¬ 
nately, we can take two shortcuts. First, since we showed 
in Appendix A that the intrinsic parameters are corre¬ 
lated only with an overall nuisance average arrival time, 
amplitude, and phase, we need not consider the deriva¬ 
tives with respect to mass at all. Second, we can reuse the 
extrinsic part of the single-detector Fisher matrix from 
Eq. (23) by computing the much simpler Jacobian ma¬ 
trix to transform from the time, amplitude, and phase on 
arrival to the parameters of interest. 

We begin by transforming the single-detector Fisher 
matrix from a polar to a rectangular representation of 
the complex amplitude given in Eqs. (14, 13), Pi,7i —>■ 
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^[zi] = piCOS 7 i, 9 [zi] = Pi sin 7 ^: 
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Consider a source in a “standard” orientation with the 
direction of propagation along the +z axis, such that 


the GW polarization tensor may be written in Cartesian 
coordinates as 

1 / 5(1 + C0S^(.) iCOSL 0\ 

i? =I icost —l(l + cos^t) 0 I . (C2) 

^ \ 0 0 0 / 

Now introduce a rotation matrix R that actively trans¬ 
forms this source to the Earth-relative polar coordinates 
0 , (j), and gives the source a polarization angle ip (adopt¬ 
ing temporarily the notation ce = cosO, se = sin 0 ): 


/ -S 0 0 \ / Ce 0 se\ / -s^ 0 \ / -1 0 0 \ 

R = R,{(l>)Ry{9)R,{iP)Ry{TT) = s^ -C0 0 0 1 0 s,^ -c,^ 0 0 1 0 . (C3) 

\o 0 I) \ -sg 0 eg) \ 0 0 i/yoo-iy 


r 


(The rightmost rotation reverses the propagation direc¬ 
tion so that the wave is traveling from the sky position 
0, (p.) With the (symmetric) detector response tensor 
we can write the received amplitude and arrival time as 


Zi = ri^i Tr 


D,RHR 


Ti = tm+ dj i?k. 


(C4) 

(C5) 


Equivalently, we can absorb the rotation R and the hori¬ 
zon distance ri^i into the polarization tensor, detector 
response tensors, and positions, 

H^H' = R,{-iP) Ry{n) H Ryin)" R,{-iP)\ (C 6 ) 

= n., Ry{ef R,{<j>) A Rz{4>) Ry{0), (C7) 

d, ^ d', = Ryi9f RM^ d,, (C 8 ) 

k'= (0,0,-l). (C9) 

Now the model becomes 

f h+ /lx 0 \ 

A = /lx -/i+ 0 , (CIO) 

\ 0 0 0 / 

z, = Tr [D[H'] = hM, - D[,) + (Cll) 

Ti =t(B + (di) • k, (C12) 


We insert an infinitesimal rotation SR to perturb the 
source’s orientation from the true value: 


Zi = Tr 


D[{SR)H'{SR) 


n=t^ + (d') m)^- 


(C15) 

(C16) 


We only need a first-order expression for SR^ because we 
will be taking products of first derivatives of it^^: 


1 0 59 

SR = \ 0 1 Sp 

-59 -5(j) 1 


(C17) 


where 


h+ = 


(1 -I- cos^ t) cos + i cos t sin 2tp 


/lx = 


-( 1 - 1 - cos^ i) sin 2 i/; — i cos i cos 2 ^ 


(C13) We construct a Jacobian matrix Ji to transform from 
the single-detector observables (3fi[zi], r^) to the po¬ 

sition perturbations, polarization components, and geo- 
centered arrival time 
(C14) (50, g?[/i+], 3[/i+], 3?[/ix], 9[/ix], A): 


21 


Caution: the angles 50 and 5(1) represent displacements in two 
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We transform and sum the information from each detec¬ 
tor, 

Inet = ^ (C19) 

i 


1. Marginalization over nnisance parameters 

To extract an area from the Fisher matrix, we must 
first marginalize or discard the nuisance parameters. 
Note that marginalizing parameters of a multivariate 
Gaussian distribution amounts to simply dropping the 
relevant entries in the mean vector and covariance ma¬ 
trix. Since the information is the inverse of the covariance 
matrix, we need to invert the Fisher matrix, drop all but 
the first two rows and columns, and then invert again. 

This procedure has a shortcut called the Schur com¬ 
plement (see, for example. Press et al. [74]). Consider a 
partitioned square matrix M and its inverse: 

"=(cd)' "■■'(as)- 

If A and B are square matrices, then the upper-left block 
of the inverse can be written as 

A-^=A-BD-^C. (C21) 

If we partition the Inet similarly, the A block consists 
of the first two rows and columns and D is the lower 
right block that describes all other parameters. Because 
the Fisher matrix is symmetric, the off-diagonal blocks 
satisfy C = B . Then the Schur complement 

Tmarg = A- BD-^B (C22) 

gives us the information matrix marginalized over all pa¬ 
rameters but 59 and 54i. 


2. Spatial interpretation 

How do we extract the dimensions of the localization 
from the Fisher matrix? If there are N < 2 detectors, 


orthogonal directions, but are not necessarily simply related to 
6 and 0. 


then the Fisher matrix must be degenerate, because there 
are 3iV measurements and seven parameters: 


' 59 ' 

5<t) 

3fi[/t+] 

< 3[/i+] > 

3[/lx] 


7 parameters 


3?[zi] 'I 

5 hi > 


X N = 3N observables. 


Therefore, for N = 2 detectors, the marginalized Fisher 
matrix Imarg is singular. Its only nonzero eigenvalue A 
describes the width of an annulus on the sky. The width 
of the annulus that contains probability p is given by 

Lp = 2 V 2 erf(p)/v^. (C23) 

The prefactor 2y/2/ erf“^(p) is the central interval of a 
normal distribution that contains a probability p, and is 
« 3.3 for p = 0.9. Caution: for two-detector networks, 
priors play an important role in practical parameter es¬ 
timation and areas can be much smaller than one would 
predict from the Fisher matrix. 

For N > 3 detectors, the parameters are overcon¬ 
strained by the data, and the Fisher matrix describes 
the dimensions of an ellipse. Within a circle of radius r 
centered on the origin, the enclosed probability p is 

p 27 r pr -I 

p= / —e~^^‘^sdsd(j)=l — e~^^‘^. (C24) 

Jo Jo 27r 

Therefore the radius r of the circle that contains a prob¬ 
ability p is 


r = ^/-2\r^{l-p). (C25) 

Suppose that the eigenvalues of the Fisher matrix are 
Al and A2. This describes a la uncertainty ellipse that 
has major and minor radii and area 

Ai„ = TT/\/\iX 2 = tt/V det T. Then, the area of an el¬ 
lipse containing probability p is 

Ap = —27rln(I — p)/VdetI, (C26) 

or, more memorably for the 90th percentile, Hg g = 
27rln(I0)/-\/detI. 
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3. Outline of calculation 

Using the above derivation, we arrive at a prediction 
for the sky resolution of a GW detector network. We took 
some shortcuts that allowed us to avoid directly evaluat¬ 
ing the complicated derivatives of the signal itself with 
respect to the sky location. As a result, the expressions 
involved in each step are simple enough to be manually 
entered into a computer program. However, because the 
procedure involves several steps, we outline it once again 
below. 

1. Compute, for each detector, the horizon distance 

ri^i, the angular frequency moments Ui and 
and from Eqs. (C13, C14). (These can be 

reused for multiple source positions as long as the 
masses and the detector noise PSDs are the same.) 

2. For a given (/>, compute the complex received 


amplitude Zi from Eqs. (CIO, Cll), the extrinsic 
Fisher matrix from Eq. (23), and the Jacobian from 
Eq. (CIS). 

3. Sum the information from all detectors using 
Eq. C19. 

4. Compute the marginalized Fisher matrix from the 
Schur complement using Eq. (C22). 

5. If there are two detectors, find the width Lp of the 
ring describing the pth quantile using Eq. (C23). If 
there are three or more detectors, find the area Ap 
of the pth quantile using Eq. (C26). 

6 . [Optionally, convert from (ste)radians to (square) 
degrees.] 

See the code listing in Appendix A.6 of Ref. [GO]. 
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